1.  Introduction


A common problem faced by an experimenter is that of comparing several populations
(processes, treatments). These may be, for example, different varieties of a grain or
different drugs for a specific disease. In other words, we have $k (\ge 2)$ populations and
each population is characterized by the value of a parameter of interest, say $\theta$, which
may be, in the example of drugs, an appropriate measure of the effectiveness of a drug. The
classical approach to this problem is to test the homogeneity hypothesis
$H_0: \theta_1 = \theta_2 = \cdots = \theta_k$, where $\theta_1, \ldots, \theta_k$ are the
values of the parameter for these populations. However, the classical tests of homogeneity
are inadequate in the sense that they do not answer a question experimenters frequently
ask, namely, how to identify the “best” population or how to select the more promising
(worthwhile) subset of the populations for further experimentation.

The formulation of a k-sample problem as a multiple decision problem enables the
experimenter to answer questions regarding the selection of the best population or of a
subset containing the best population. Multiple decision procedures in the framework of
selection and ranking have generally been formulated using either the indifference zone
approach or the subset selection approach. The former approach was introduced by
Bechhofer (1954). Substantial contributions to the early and subsequent developments in
subset selection theory have been made by Gupta (1956, 1965). A discussion of the
differences between the two approaches and of the various modifications that have taken
place since then can be found in Gupta and Panchapakesan (1979).

In many situations, an experimenter may have some prior information about the
parameters of interest and would like to use this information to make an appropriate
decision. In this sense, the classical ranking and selection procedures may seem
conservative if the prior information has not been considered. If the information at hand
can be quantified into a single prior distribution, one would like to apply a Bayes
procedure since it achieves the minimum Bayes risk among a class of decision procedures.
In his recent book, Berger (1985) discusses several approaches to selecting a prior
distribution based on the information at hand. Some contributions to multiple decision
problems using the Bayesian approach have been made by Bickel and Yahav (1977), Chernoff
and Yahav (1977), Deely and Gupta (1968), Goel and Rubin (1977), Gupta and Hsiao (1981),
Gupta and Miescke (1984), Gupta and Yang (1985), Berger and Deely (1986), Guttman and Tiao
(1964), Miescke (1979) and Roth (1978), among others. Readers are referred to Box and Tiao
(1973) and Berger (1985) for general Bayesian inference in statistical analysis.

However, it is usually difficult, perhaps impossible, to quantify the prior information
through a single prior. Therefore, it has been suggested (see, for example, Robbins (1964))
that the prior information be quantified through a class $\Gamma$ of subjectively plausible
priors. Blum and Rosenblatt (1967) and Berger and Berliner (1986) have used this idea in
statistical inference. One of the approaches based on a class $\Gamma$ of subjectively
plausible priors is the so-called $\Gamma$-minimax approach. One would like to apply the
$\Gamma$-minimax procedure, which minimizes the supremum of the Bayes risk over the class
$\Gamma$ of priors. Some contributions to multiple decision problems using this criterion
have been made by Gupta and Hsiao (1981), Gupta and Huang (1977), Gupta and Kim (1980),
Huang and Tseng (1983), Miescke (1981) and Randles and Hollander (1971). Also, Deely (1965)
studied some selection problems through the empirical Bayes approach, assuming that the
prior distribution belongs to a class of distributions with some unknown hyperparameters.

The empirical Bayes approach in statistical decision theory is appropriate when one is
confronted repeatedly and independently with the same decision problem. In such instances,
it is reasonable to formulate the component problem with respect to an unknown (or
partially known) prior distribution on the parameter space. One then uses the accumulated
observations to improve the decision procedure at each stage. This approach is due to
Robbins (1956, 1964, 1983). Empirical Bayes procedures have been derived for multiple
decision problems by Deely (1965) for selecting a subset containing the best population.
Van Ryzin (1970), Huang (1975), and Van Ryzin and Susarla (1977) also studied other
multiple decision problems by using the empirical Bayes approach. Recently, Gupta and Hsiao
(1983) and Gupta and Leu (1983) have studied empirical Bayes procedures for selecting good
populations with respect to a standard or a control. Gupta and Liang (1984, 1986) have
studied empirical Bayes procedures for the problem of selecting the best population, or
selecting populations better than a standard or a control, when the underlying populations
are binomially distributed. Many such empirical Bayes procedures have been shown to be
asymptotically optimal in the sense that the risk for the n-th decision problem converges
to the optimal Bayes risk, that is, the risk which would have been obtained if the prior
distribution were fully known and the Bayes procedure with respect to this prior
distribution were used.

In the present paper, we describe selection and ranking procedures using prior
distributions or using the information contained in the past data. Section 2 of this paper
deals with the problem of selecting the best population through the Bayesian approach. An
essentially complete class is obtained for a class of reasonable loss functions. We also
discuss Bayes-P* selection procedures, which are better than the classical subset selection
procedures in terms of the size of the selected subset. In Section 3, we first set up a
general formulation of the empirical Bayes framework for selection and ranking problems.
Several empirical Bayes frameworks are discussed based on the underlying statistical
models. Two selection problems dealing with binomial and uniform populations are discussed
in detail.

2.  Bayesian  Approach 

2.1.  Notations  and  Formulation  of  the  Selection  Problem 

Let $\theta_i \in \Theta \subset R$ denote the unknown characteristic of interest associated
with population $\pi_i$, $i = 1, \ldots, k$. Let $X_1, \ldots, X_k$ be random variables
representing the k populations $\pi_i$, $i = 1, \ldots, k$, respectively, with $X_i$ having
the probability density function (or probability frequency function in the discrete case)
$f_i(x|\theta_i)$. In many cases, $X_i$ is a sufficient statistic for $\theta_i$. It is
assumed that, given $\theta = (\theta_1, \ldots, \theta_k)$, $X = (X_1, \ldots, X_k)$ has a
joint probability density function $f(x|\theta) = \prod_{i=1}^{k} f_i(x_i|\theta_i)$, where
$x = (x_1, \ldots, x_k)$. Let $\theta_{[1]} \le \theta_{[2]} \le \ldots \le \theta_{[k]}$
denote the ordered values of the $\theta_i$'s and let $\pi_{[i]}$ denote the unknown
population associated with $\theta_{[i]}$. The population $\pi_{[k]}$ will be called the
best population. If more than one population satisfies this condition, we arbitrarily tag
one of them and call it the best one. Also, we let
$\Omega = \{\theta \mid \theta_i \in \Theta, i = 1, \ldots, k\}$ denote the parameter space
and denote by $G(\cdot)$ a prior distribution on $\theta$ over $\Omega$.

Let $\mathcal{A}$ be the action space consisting of all the $2^k - 1$ nonempty subsets of
the set $\{1, \ldots, k\}$. When action $S$ is taken, we mean that population $\pi_i$ is
included in the selected subset if $i \in S$. For each $\theta \in \Omega$ and
$S \in \mathcal{A}$, let $L(\theta, S)$ denote the loss incurred when $\theta$ is the true
state of nature and the action $S$ is taken. A decision procedure $d$ is defined to be a
mapping from $\mathcal{X} \times \mathcal{A}$ into $[0, 1]$, where $\mathcal{X}$ is the
sample space of $X = (X_1, \ldots, X_k)$. That is, for $x \in \mathcal{X}$ and
$S \in \mathcal{A}$, $d(x, S)$ is the probability of taking action $S$ when $X = x$ is
observed. Let $D^*$ be the set of all decision procedures $d(x, S)$.

For each $d \in D^*$, let $B(d, G)$ denote the associated Bayes risk. That is,

(2.1)    $B(d, G) = \int_\Omega \int_{\mathcal{X}} \sum_{S \in \mathcal{A}} d(x, S) L(\theta, S) f(x|\theta) \, dx \, dG(\theta)$.

Then, $B(G) = \inf_{d \in D^*} B(d, G)$ is the minimum Bayes risk.

An optimal decision procedure, denoted by $d_G$, is obtained if $d_G$ has the property that

(2.2)    $B(d_G, G) = B(G)$.

Such a procedure is called Bayes with respect to $G$. Under some regularity conditions, we
can write (2.1) as

(2.3)    $B(d, G) = \int_{\mathcal{X}} \sum_{S \in \mathcal{A}} d(x, S) \int_\Omega L(\theta, S) f(x|\theta) \, dG(\theta) \, dx$.

Now let

(2.4)    $r_G(x, S) = \int_\Omega L(\theta, S) f(x|\theta) \, dG(\theta)$,

and

(2.5)    $A_G(x) = \{S \in \mathcal{A} \mid r_G(x, S) = \min_{S' \in \mathcal{A}} r_G(x, S')\}$.

Then, a sufficient condition for (2.2) is that $d_G$ satisfies

(2.6)    $\sum_{S \in A_G(x)} d_G(x, S) = 1$.
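
The construction in (2.4) to (2.6) can be sketched numerically. The following Python fragment is only an illustration under stated assumptions: the prior is taken to be finitely supported (so the integral in (2.4) becomes a sum), the likelihood is a unit-variance normal density, and the loss is a cost per selected population plus the shortfall of the best selected parameter from the overall best. None of these choices, nor the function names, come from the text.

```python
from itertools import combinations
from math import exp, pi, sqrt


def normal_pdf(x, mean):
    # Unit-variance normal density; an assumed likelihood for illustration.
    return exp(-0.5 * (x - mean) ** 2) / sqrt(2.0 * pi)


def nonempty_subsets(k):
    # The action space: all 2^k - 1 nonempty subsets of {0, ..., k-1}.
    for size in range(1, k + 1):
        for s in combinations(range(k), size):
            yield frozenset(s)


def loss(theta, subset, a=0.1):
    # Assumed loss: a * |S| plus (largest theta) - (largest selected theta).
    return a * len(subset) + max(theta) - max(theta[j] for j in subset)


def posterior_risk(x, subset, prior):
    # r_G(x, S) of (2.4) for a prior given as (theta_vector, weight) pairs.
    total = 0.0
    for theta, weight in prior:
        likelihood = 1.0
        for xi, ti in zip(x, theta):
            likelihood *= normal_pdf(xi, ti)
        total += loss(theta, subset) * likelihood * weight
    return total


def bayes_actions(x, prior, tol=1e-12):
    # A_G(x) of (2.5): every subset whose r_G(x, S) attains the minimum.
    risks = {s: posterior_risk(x, s, prior) for s in nonempty_subsets(len(x))}
    best = min(risks.values())
    return [s for s, r in risks.items() if r <= best + tol]


# Hypothetical two-population prior: either population may be the better one.
prior = [((0.0, 1.0), 0.5), ((1.0, 0.0), 0.5)]
print(bayes_actions((0.0, 3.0), prior))
```

Any procedure that puts all of its probability on the returned subsets satisfies (2.6) for that observation. Enumerating all $2^k - 1$ subsets is only practical for small k.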
2.2.  An  Essentially  Complete  Class  of  Decision  Procedures

In this subsection, we consider a class of loss functions possessing the following
properties:

Let $H$ denote the group of all permutations of the components of a k-component vector.

Definition 2.1: A loss function $L$ has property T if

(a) $L(\theta, S) = L(h\theta, hS)$ for all $\theta \in \Omega$, $S \in \mathcal{A}$ and $h \in H$, and

(b) $L(\theta, S') \le L(\theta, S)$ if the following holds for each pair $(i, j)$ with
$\theta_i \le \theta_j$: $i \in S$, $j \notin S$ and $S' = (S - \{i\}) \cup \{j\}$.

Property (a) assures invariance under permutation and property (b) assures the
monotonicity of the loss function. In many situations, a loss function satisfying these
assumptions seems quite natural. Some examples of such loss functions are:

(2.7a)    $L(\theta, S) = a|S| + \theta_{[k]} - \theta_S$    (Goel and Rubin (1977));

(2.7b)    $L(\theta, S) = \frac{1}{|S|} \sum_{j \in S} (\theta_{[k]} - \theta_j) + b I_{\{\theta_{[k]} > \theta_S\}}$    (Bickel and Yahav (1977));

(2.7c)    $L(\theta, S) = c(\theta_{[k]} - \theta_S) + \theta_{[k]} - \frac{1}{|S|} \sum_{j \in S} \theta_j$    (Chernoff and Yahav (1977));

(2.7d)    $L(\theta, S) = |S| + e I_{\{\theta_{[k]} > \theta_S\}}$    (Gupta and Hsu (1978));

(2.7e)    $L(\theta, S) = \alpha(|S|) \sum_{j \in S} (\theta_{[k]} - \theta_j)$    (Deely and Gupta (1968));

where $|S|$ denotes the cardinality of the subset $S$, $\theta_S = \max_{j \in S} \theta_j$,
$a$, $b$, $c$ and $e$ are positive constants, $\alpha(\cdot)$ is a positive function on the
set $\{1, 2, \ldots, k\}$ and $I_A$ denotes the indicator function of the set $A$.

Note that different loss functions have different interpretations. For further discussion,
see Gupta and Hsu (1978).

We now let $x_{(1)} \le x_{(2)} \le \ldots \le x_{(k)}$ denote the ordered observations.
Here the $(i)$ can be viewed as the (unknown) index of the population associated with the
observation $x_{(i)}$. For each $j = 1, \ldots, k$, let $S_j = \{(k), \ldots, (k - j + 1)\}$,
and let the remaining subsets $S_j$ be associated one-to-one with
$j = k + 1, \ldots, 2^k - 1$, arbitrarily. Also, let
$\mathcal{A}_m = \{S \in \mathcal{A} \mid |S| = m\}$, $m = 1, \ldots, k$, and
$D_1 = \{d \in D^* \mid \sum_{j=1}^{k} d(x, S_j) = 1 \text{ for all } x \in \mathcal{X}\}$.

Theorem 2.1: Suppose that $f_i(x_i|\theta_i) = f(x_i|\theta_i)$, $i = 1, \ldots, k$, where
the pdf $f(x|\theta)$ possesses the monotone likelihood ratio (MLR) property, and the prior
distribution $G$ is symmetric on $\Omega$. Then,

(a) for each $m = 1, \ldots, k$, $r_G(x, S_m) \le r_G(x, S)$ for all
$S \in \mathcal{A}_{k-m+1}$, $x \in \mathcal{X}$, and

(b) $D_1$ is an essentially complete class in $D^*$;
provided that the loss function has property T.

Proof: The proof for part (a) is analogous to that of Theorem 3.3 of Gupta and Yang
(1985). For part (b), let $d$ be any decision procedure in $D^*$. Consider the decision
procedure $d^*$ defined as: for $x \in \mathcal{X}$,

$d^*(x, S_m) = \sum_{S \in \mathcal{A}_{k-m+1}} d(x, S)$,  $m = 1, \ldots, k$,

and

$d^*(x, S) = 0$,  $S \ne S_m$,  $m = 1, \ldots, k$.

Then, $d^* \in D_1$. Also, by part (a) and (2.3), one can see that

$B(d^*, G) \le B(d, G)$.

Hence, the proof of part (b) is completed.

Let $A'_G(x) = \{S_j \mid 1 \le j \le k, \; r_G(x, S_j) = \min_{1 \le i \le k} r_G(x, S_i)\}$.
Then, under the conditions of Theorem 2.1, any Bayes procedure $d_G$ satisfies
$\sum_{S_j \in A'_G(x)} d_G(x, S_j) = 1$ for $x \in \mathcal{X}$.

Goel and Rubin (1977) choose the loss function (2.7a) and study the behavior of the
corresponding Bayes procedure in great detail. Bickel and Yahav (1977) assume that
$\theta_{[1]}, \ldots, \theta_{[k]}$ are known and consider the loss function (2.7b). They
obtain the best invariant procedure for the normal pdf and then depart from the decision
theoretic approach to simplify this procedure as $k \to \infty$. Chernoff and Yahav (1977)
consider the loss function (2.7c) and compare the performance of the Bayes procedure with
other procedures in a “normal model” on the basis of Monte Carlo results.

2.3.  Bayes  Procedures  with  Respect  to  Additive  Loss  Functions

Deely and Gupta (1968) consider the loss function $L(\theta, S)$ corresponding to the
choice of $S$ given by

(2.8)    $L(\theta, S) = \sum_{j \in S} a_{Sj} (\theta_{[k]} - \theta_j)$.

Some examples of such a loss function are:

(2.8a)    $L(\theta, S) = \sum_{j \in S} (\theta_{[k]} - \theta_j)$,  sum of losses;

(2.8b)    $L(\theta, S) = \frac{1}{|S|} \sum_{j \in S} (\theta_{[k]} - \theta_j)$,  average loss;

(2.8c)    $L(\theta, S) = (k + 1 - |S|) \sum_{j \in S} (\theta_{[k]} - \theta_j)$.

Note that all three of these loss functions have property T.

Deely and Gupta (1968) proved that when $a_{Sj} = a > 0$ for all $S \in \mathcal{A}$ and
$j \in S$, the Bayes procedure always selects exactly one population. Miescke (1979)
slightly generalized the result of Deely and Gupta (1968). He considered the loss function

(2.9)    $L(\theta, S) = \sum_{i \in S} \alpha(|S|) \ell_i(\theta)$,

where $\alpha(\cdot)$ is a nonnegative function on the set $\{1, \ldots, k\}$. We cite his
result as follows:

Theorem 2.2. Let $m\alpha(m) \ge \alpha(1)$, $m = 1, 2, \ldots, k$. If the $\ell_i$'s are
nonnegative, then there exists a Bayes procedure which always selects exactly one
population.

Theorem 2.2 does not hold if the nonnegativity of $\ell_i$ for all $i = 1, \ldots, k$ is
not satisfied. For example, consider the loss function

(2.10)    $L(\theta, S) = \sum_{i \in S} (\theta_{[k]} - \theta_i - \epsilon)$,

where $\epsilon > 0$ is a given constant. This loss function can be used for the problem of
selecting populations close to the best. With this loss function, it is possible to select
more than one population.
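
To see why (2.10) can lead to more than one selected population, note that the posterior expected loss of a subset is the sum over its members of $E[\theta_{[k]} - \theta_i - \epsilon \mid x]$, so every population whose term is negative lowers the total. The sketch below illustrates this under an assumption that is not in the text: the posterior of $\theta$ is represented as a finite list of (parameter vector, probability) pairs. The function name is hypothetical.

```python
def bayes_subset_eps_loss(posterior, eps):
    # posterior: list of (theta_vector, probability) pairs summing to 1
    # (an assumed discrete representation of the posterior given x).
    k = len(posterior[0][0])
    # cost[i] = E[theta_[k] - theta_i - eps | x], the contribution of i to the loss.
    cost = [
        sum(p * (max(theta) - theta[i] - eps) for theta, p in posterior)
        for i in range(k)
    ]
    # Include every population with a negative contribution.
    chosen = [i for i in range(k) if cost[i] < 0]
    # The action must be nonempty: otherwise take the least costly population.
    if not chosen:
        chosen = [min(range(k), key=lambda i: cost[i])]
    return chosen, cost


# Hypothetical posterior: populations 0 and 1 are nearly tied, 2 is far behind.
posterior = [((1.0, 0.9, 0.2), 0.5), ((0.9, 1.0, 0.2), 0.5)]
print(bayes_subset_eps_loss(posterior, eps=0.1)[0])
```

With $\epsilon = 0$ every contribution is nonnegative and the rule falls back to a single population, which matches the nonnegative case covered by Theorem 2.2.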
2.4.  Bayes-P*  Selection  Procedures

In  this  section,  we  continue  with  the  general  setup  of  Section  2.1. 

A selection procedure $\psi = (\psi_1, \ldots, \psi_k)$ is defined to be a mapping from
$\mathcal{X}$ to $[0, 1]^k$, where $\psi_i(x): \mathcal{X} \to [0, 1]$ is the probability
that $\pi_i$ is included in the selected subset when $X = x$ is observed. A selection
procedure is called nonrandomized if all $\psi_i$'s are 0 or 1; otherwise, it is a
randomized procedure. A correct selection (CS) is defined to be the selection of any subset
that includes the best population.

Let $d$ be any decision procedure considered in earlier sections. A selection procedure
$\psi = (\psi_1, \ldots, \psi_k)$ associated with $d$ can be obtained by letting

(2.11)    $\psi_i(x) = \sum_{S \ni i} d(x, S)$,

where the summation is over all the subsets containing $i$.
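
A minimal sketch of the conversion in (2.11), assuming that for a fixed observation x the decision procedure is stored as a dictionary from subsets to probabilities (a representation chosen here for illustration, with populations indexed from 0):

```python
def inclusion_probabilities(d, k):
    # d: dict mapping frozenset subsets of {0, ..., k-1} to d(x, S) for one fixed x.
    psi = [0.0] * k
    for subset, prob in d.items():
        for i in subset:
            # Population i collects the probability of every subset containing it.
            psi[i] += prob
    return psi


# Hypothetical randomized decision over three populations.
d = {frozenset({0}): 0.5, frozenset({0, 1}): 0.25, frozenset({2}): 0.25}
print(inclusion_probabilities(d, 3))  # [0.75, 0.25, 0.25]
```

The mapping is many-to-one: different decision procedures can share the same inclusion probabilities, so $\psi$ summarizes $d$ without determining it.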

In the decision-theoretic approach, a Bayes decision (selection) procedure always provides
a decision with minimum risk under a given loss. In practice, however, the loss is often
hard to specify, and the Bayesian result is quite sensitive to the loss used; in this
sense, being Bayes does not guarantee that the quality of a procedure passes a given
level. To guarantee the quality of decision (selection) procedures, one would like to
impose a “quality control” on the class of all possible decision (selection) procedures.
That is, any procedure of lower quality is removed, even though it might be the cheapest
one under some losses. Analogous to the classical subset selection approach, Gupta and Yang
(1985) set up a control condition using the Bayesian approach.
Let

(2.12)    $p_i(x) = P\{\pi_i \text{ is the best} \mid X = x\} = P\{\theta_i \text{ is the largest} \mid X = x\}$

be the posterior probability that population $\pi_i$ is the best population when $X = x$ is
observed. Then, for a selection procedure $\psi$, the posterior probability of a correct
selection given $X = x$ is

(2.13)    $P(CS \mid \psi, X = x) = \sum_{i=1}^{k} \psi_i(x) p_i(x)$.

Definition 2.2. Given a number $P^*$, $k^{-1} < P^* < 1$, and a prior $G$ on $\Omega$, we say
a selection procedure $\psi$ satisfies the PP*-condition (posterior P*-condition) if

(a) $\psi_i(x) = 1$ for at least one $i$, $1 \le i \le k$, and

(b) $P(CS \mid \psi, X = x) \ge P^*$ for all $x \in \mathcal{X}$.

Note that $\sum_{i=1}^{k} p_i(x) = 1$ for all $x \in \mathcal{X}$; hence selection
procedures of this kind always exist. We let $C = C(P^*)$ ($C^* = C^*(P^*)$) be the class of
all nonrandomized (randomized) selection procedures satisfying the PP*-condition.

Let $p_{[1]}(x) \le \ldots \le p_{[k]}(x)$ be the ordered $p_i(x)$'s and let $\pi_{(i)}$ be
the population associated with $p_{[i]}(x)$, $i = 1, \ldots, k$. Then a selection procedure
$\psi$ can be completely specified by $\{\psi_{(1)}, \ldots, \psi_{(k)}\}$, where

(2.14)    $\psi_{(i)}(x) = P\{\pi_{(i)} \text{ is selected} \mid \psi, X = x\}$,  $i = 1, \ldots, k$.

Gupta and Yang (1985) proposed two selection procedures; one is nonrandomized, say
$\psi^G$, and the other is randomized, say $\psi^{G*}$. They are defined as below.
Definition 2.3. Given a number $P^*$, $k^{-1} < P^* < 1$, and an observation $X = x$, let

$j = \max\{m \mid \sum_{i=m}^{k} p_{[i]}(x) \ge P^*\}$.

(a) The nonrandomized selection procedure $\psi^G$ is defined by
$\{\psi^G_{(1)}, \ldots, \psi^G_{(k)}\}$, where

$\psi^G_{(i)}(x) = 1$ if $i \ge j$;  $0$ otherwise.

(b) The randomized selection procedure $\psi^{G*}$ is defined by
$\{\psi^{G*}_{(1)}, \ldots, \psi^{G*}_{(k)}\}$, where $\psi^{G*}_{(k)}(x) = 1$, and for
$1 \le i \le k - 1$,

$\psi^{G*}_{(i)}(x) = 1$ if $i > j$;  $\lambda$ if $i = j$;  $0$ if $i < j$;

the constant $\lambda$ is determined so that

$\lambda p_{[j]}(x) + \sum_{m=j+1}^{k} p_{[m]}(x) = P^*$.

It is clear that $\psi^G \in C$ and $\psi^{G*} \in C^*$. In the following, some optimalities
of these two selection procedures are investigated.
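
The two rules of Definition 2.3 depend on the data only through the posterior probabilities $p_i(x)$, so they are easy to sketch once those probabilities are available. The fragment below assumes they are supplied as a list (how they are computed depends on the model and prior, which the code does not address); the function name and the clamp on $\lambda$ are illustrative additions.

```python
def bayes_pstar_rules(p, p_star):
    # p: posterior probabilities p_i(x) that each population is the best (sum to 1).
    k = len(p)
    order = sorted(range(k), key=lambda i: p[i])  # order[r-1] has rank r (ascending p)
    # j = largest rank m such that p_[m] + ... + p_[k] >= P*.
    tail = 0.0
    j = 1
    for m in range(k, 0, -1):
        tail += p[order[m - 1]]
        if tail >= p_star:
            j = m
            break
    above = sum(p[order[r - 1]] for r in range(j + 1, k + 1))
    psi_nonrand = [0.0] * k
    psi_rand = [0.0] * k
    for rank in range(1, k + 1):
        idx = order[rank - 1]
        if rank >= j:
            psi_nonrand[idx] = 1.0
        if rank > j or rank == k:
            psi_rand[idx] = 1.0
        elif rank == j:
            # lambda solves lambda * p_[j] + sum_{m > j} p_[m] = P*;
            # the clamp only guards against floating-point rounding.
            psi_rand[idx] = min(1.0, (p_star - above) / p[idx])
    return psi_nonrand, psi_rand


p = [0.5, 0.125, 0.25, 0.125]  # hypothetical posterior probabilities
nonrand, rand = bayes_pstar_rules(p, p_star=0.7)
print(nonrand, rand)
```

In this example the nonrandomized rule attains posterior probability 0.75 of a correct selection, while the randomized rule attains 0.7, which is exactly $P^*$, and selects a smaller subset on average.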


Definition 2.4. A selection procedure $\psi$ is called ordered if for every
$x \in \mathcal{X}$, $x_i \le x_j$ implies $\psi_i(x) \le \psi_j(x)$. It is called monotone
or just if for every $i = 1, \ldots, k$, and $x, y \in \mathcal{X}$,
$\psi_i(x) \le \psi_i(y)$ whenever $x_i \le y_i$, $x_j \ge y_j$ for any $j \ne i$.

Some sufficient conditions for $\psi^G$ ($\psi^{G*}$) to be ordered and monotone are given
below:

Theorem 2.3. (Gupta and Yang (1985)). Let $G(\theta|x)$ be the posterior cdf of $\theta$,
given $X = x$. Let $G(\theta|x)$ be absolutely continuous and have the generalized
stochastic increasing property, that is:

(1) $G(\theta|x) = \prod_{i=1}^{k} G_i(\theta_i|x)$, where $G_i(\cdot|x)$ is the posterior
cdf of $\theta_i$.

(2) $G_i(t|x) \ge G_j(t|x)$ for any $t$, whenever $x_i \le x_j$.

Then, both $\psi^G$ and $\psi^{G*}$ are ordered and monotone.

Gupta and Yang (1985) also investigated some optimal behavior of these two procedures
through the decision-theoretic approach over a class of loss functions.

Definition 2.5. A loss function $L$ has property T' if

(a) $L$ has property T, and

(b) $L(\theta, S) \le L(\theta, S')$ if $S \subset S'$.

Theorem 2.4. (Gupta and Yang (1985)). Under the assumption of Theorem 2.3, the selection
procedure $\psi^G$ ($\psi^{G*}$) is Bayes in $C$ ($C^*$) provided that the loss function has
property T'.

Gupta and Yang (1985) investigated the computation of $p_i(x)$ for the “normal model” by
using normal and non-informative priors. Berger and Deely (1986) consider another selection
problem, and give a more detailed discussion about the computation of $p_i(x)$ under
several different priors.


3.  Empirical  Bayes  Approach 

In this section, we continue with the general setup of Section 2. However, we assume only
the existence of a prior distribution $G$ on $\Omega$; the form of $G$ is unknown or only
partially known. In Section 3.1, we consider decision procedures for general loss
functions. In Sections 3.2 and 3.3, only selection procedures are considered.
3.1.  Formulation  and  Summary  of  the  Empirical  Bayes  Selection  Problems

For each $i$, $i = 1, \ldots, k$, let $X_{ij}$ denote the random observation from $\pi_i$ at
stage $j$. Let $\Theta_{ij}$ denote the random characteristic of $\pi_i$ at stage $j$.
Conditional on $\Theta_{ij} = \theta_{ij}$, $X_{ij}|\theta_{ij}$ has pdf (or pf in the
discrete case) $f_i(x|\theta_{ij})$. Let $X_j = (X_{1j}, \ldots, X_{kj})$ and
$\Theta_j = (\Theta_{1j}, \ldots, \Theta_{kj})$. Suppose that independent observation
vectors $X_1, \ldots, X_n$ are available and that $\Theta_j$, $1 \le j \le n$, have the same
distribution $G$ for all $j$, though they are not observable. We also let
$X = (X_1, \ldots, X_k)$ denote the present random observation.

Consider an empirical Bayes decision procedure $d_n$. Let $B(d_n, G)$ be the Bayes risk
associated with the decision procedure $d_n$. Then

$B(d_n, G) = \int_\Omega E \int_{\mathcal{X}} \sum_{S \in \mathcal{A}} d_n((x; X_1, \ldots, X_n), S) L(\theta, S) f(x|\theta) \, dx \, dG(\theta)$,

where $d_n((x; X_1, \ldots, X_n), S)$ $(\equiv d_n(x, S))$ is the probability of selecting
the subset $S$ when $(x; X_1, \ldots, X_n)$ is observed, and the expectation $E$ is taken
with respect to $(X_1, \ldots, X_n)$. Note that $B(d_n, G) - B(G) \ge 0$, since $B(G)$ is
the minimum Bayes risk. This nonnegative difference is used as a measure of optimality of
the decision procedure $d_n$.

Definition 3.1. A sequence of decision procedures $\{d_n\}_{n=1}^{\infty}$ is said to be
asymptotically optimal relative to the prior distribution $G$ if $B(d_n, G) \to B(G)$ as
$n \to \infty$.

Let $L(\theta) = \max_{S \in \mathcal{A}} |L(\theta, S)|$ and assume that
$\int_\Omega L(\theta) \, dG(\theta) < \infty$. Following Robbins (1964), one can see that a
sufficient condition for the sequence $\{d_n\}$ to be asymptotically optimal is that
$d_n(x, S) \xrightarrow{P} d_G(x, S)$ for all $x \in \mathcal{X}$ and $S \in \mathcal{A}$,
where $\xrightarrow{P}$ means convergence in probability (with respect to
$(X_1, \ldots, X_n)$).

Let $G_n$ be a distribution function on the parameter space $\Omega$. Suppose $G_n$ is a
function of $(X_1, \ldots, X_n)$ such that
$P\{\lim_{n \to \infty} G_n(\theta) = G(\theta) \text{ for every continuity point } \theta \text{ of } G\} = 1$,
where the probability is with respect to $(X_1, \ldots, X_n)$. Let the loss function
$L(\theta, S)$ and the density $f(x|\theta)$ be such that $L(\theta, S) f(x|\theta)$ is
bounded and continuous in $\theta$ for every $S \in \mathcal{A}$. Then $\{d_{G_n}\}$ is
asymptotically optimal with respect to $G$ if $\int_\Omega L(\theta) \, dG(\theta) < \infty$,
where $d_{G_n}$ is a Bayes procedure with respect to the distribution $G_n$.

To find $G_n$, we may assume $G$ to be a member of some parametric family $\Gamma$ with
unknown hyperparameters, say $\lambda = (\lambda_1, \ldots, \lambda_k)$. Suppose now that an
estimator $\lambda_n = (\lambda_{1n}, \ldots, \lambda_{kn})$ depending on the previous
observations $(X_1, \ldots, X_n)$ can be found such that $G_n$ converges to $G$ with
probability one. Note that $G_n$ is also a member of $\Gamma$. We then follow the typical
Bayesian analysis and derive the Bayes procedure $d_{G_n}$ with respect to the estimated
prior distribution $G_n$. Then, according to the result of Deely (1965), the sequence of
empirical Bayes procedures $\{d_{G_n}\}$ is asymptotically optimal. This approach is
referred to as parametric empirical Bayes. Deely (1965) has derived the empirical Bayes
procedures through the parametric empirical Bayes approach in several special cases, among
which are (a) normal-normal, (b) normal-uniform, (c) binomial-beta, and (d) Poisson-gamma.
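
As a rough illustration of the parametric route in the binomial-beta case, the sketch below estimates the two beta hyperparameters from past binomial counts and then forms the posterior mean under the estimated prior. The method-of-moments estimator used here is an assumption made for the example; the text does not say which estimator Deely (1965) used, and the function names are hypothetical.

```python
def fit_beta_by_moments(xs, n_trials):
    # xs: past success counts, each out of n_trials, from one population.
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    mu = mean / n_trials
    binom_var = n_trials * mu * (1.0 - mu)
    # Beta-binomial variance: N mu (1 - mu) (M + N) / (M + 1), with M = a + b.
    # Solving for M requires more spread than a plain binomial would show.
    if var <= binom_var:
        raise ValueError("no overdispersion: moment estimate undefined")
    total = (n_trials ** 2 * mu * (1.0 - mu) - var) / (var - binom_var)
    if total <= 0:
        raise ValueError("moment estimate of a + b is not positive")
    return mu * total, (1.0 - mu) * total


def beta_posterior_mean(x, n_trials, a, b):
    # Posterior mean of theta given x successes under a Beta(a, b) prior.
    return (a + x) / (a + b + n_trials)


# Hypothetical past data: counts out of N = 2 trials, spread evenly over 0, 1, 2.
a_hat, b_hat = fit_beta_by_moments([0, 1, 1, 2, 0, 2], n_trials=2)
print(a_hat, b_hat, beta_posterior_mean(2, 2, a_hat, b_hat))
```

In a selection problem this would be repeated for each of the k populations, and the Bayes rule would then be evaluated with the estimated priors in place of the unknown ones.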

In another approach, called nonparametric empirical Bayes, one just assumes that
$\Theta_j$, $j = 1, 2, \ldots$, are independently and identically distributed; however, the
form of the prior distribution $G$ on $\Omega$ is completely unknown. In this situation,
one may either estimate the prior distribution and then proceed to a typical Bayesian
analysis, or represent the Bayes procedure in terms of the unknown prior and then use the
data to estimate the Bayes procedure directly. The estimation of the prior distribution
through the nonparametric empirical Bayes approach has been studied (see Simar (1976) for
the Poisson distribution and Jewell (1982) for the exponential distribution). For the
second approach, see Van Ryzin (1970), Van Ryzin and Susarla (1977), Gupta and Hsiao
(1983), Gupta and Leu (1983), and Gupta and Liang (1984, 1986), among others.

In the following sections, we consider some selection problems with underlying populations
having binomial or uniform distributions. We will use the approach of first looking at the
form of the Bayes procedure and then estimating the Bayes procedure directly.

3.2.  Empirical  Bayes  Procedures  Related  to  Binomial  Populations 

In this section, two selection problems related to binomial populations are discussed:
selecting the best among k binomial populations and selecting populations better than a
standard or a control. For each $i$, the observation $X_i$ can be viewed as the number of
successes among $N$ independent trials taken from $\pi_i$, and the parameter $\theta_i$ as
the probability of a success for each trial in $\pi_i$. Then $X_i|\theta_i$ has probability
function $f_i(x|\theta_i) = \binom{N}{x} \theta_i^x (1 - \theta_i)^{N-x}$,
$x = 0, 1, \ldots, N$. We let $G_i(\cdot)$ denote the prior distribution of $\theta_i$ and
assume that $G(\theta) = \prod_{i=1}^{k} G_i(\theta_i)$.

3.2.1.  Selecting  the  Best  Binomial  Population

Gupta and Liang (1986) considered the loss function

(3.1)    $L(\theta, \{i\}) = \theta_{[k]} - \theta_i$

for the problem of selecting the largest binomial parameter $\theta_{[k]}$ among k binomial
populations.

Let $f_i(x) = \int_0^1 f_i(x|\theta) \, dG_i(\theta)$,
$W_i(x) = \int_0^1 \theta f_i(x|\theta) \, dG_i(\theta)$ and $\varphi_i(x) = W_i(x)/f_i(x)$.
Then, from (3.1), following straightforward computation, a randomized Bayes selection
procedure, say $\psi^B = (\psi^B_1, \ldots, \psi^B_k)$, is given below:

(3.2)    $\psi^B_i(x) = |S(x)|^{-1}$ if $i \in S(x)$;  $0$ otherwise;

where

(3.3)    $S(x) = \{i \mid \varphi_i(x_i) = \max_{1 \le j \le k} \varphi_j(x_j)\}$.

Here, $\psi^B_i(x)$ is the probability of selecting $\pi_i$ as the best population given
$X = x$.

Note that $\varphi_i(x)$ is the Bayes estimator of the parameter $\theta_i$ under the
squared error loss given $X_i = x$. One can see that $\varphi_i(x)$ is increasing in $x$ for
$i = 1, \ldots, k$ and hence $\psi^B$ is a monotone selection procedure.
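
The rule (3.2) and (3.3) can be sketched directly when each prior $G_i$ is known. The fragment below assumes, purely for illustration, that each prior is finitely supported on points strictly between 0 and 1, so that $f_i(x)$ and $W_i(x)$ become finite sums; the function names and the tie tolerance are hypothetical.

```python
from math import comb


def posterior_mean(x, n_trials, prior):
    # prior: list of (theta, weight) pairs with 0 < theta < 1 (assumed discrete G_i).
    # Returns phi_i(x) = W_i(x) / f_i(x).
    f = 0.0
    w_sum = 0.0
    for theta, weight in prior:
        pmf = comb(n_trials, x) * theta ** x * (1.0 - theta) ** (n_trials - x)
        f += weight * pmf
        w_sum += weight * theta * pmf
    return w_sum / f


def bayes_best_selection(xs, n_trials, priors, tol=1e-12):
    # (3.3): S(x) collects the populations with the largest posterior mean;
    # (3.2): the selection probability is split evenly over S(x).
    phi = [posterior_mean(x, n_trials, g) for x, g in zip(xs, priors)]
    top = max(phi)
    best = [i for i, v in enumerate(phi) if v >= top - tol]
    return [1.0 / len(best) if i in best else 0.0 for i in range(len(xs))]


prior = [(0.2, 0.5), (0.8, 0.5)]  # hypothetical two-point prior shared by all populations
print(bayes_best_selection([1, 4, 4], n_trials=5, priors=[prior] * 3))
```

When all populations share the same prior, the rule reduces to picking the largest observed count, with ties split evenly, which reflects the monotonicity of $\varphi_i$ noted above.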

A.  Formulation  of  the  Empirical  Bayes  Framework 

Due to the surprising quirk that $\varphi_i(x)$ cannot be consistently estimated in the
usual empirical Bayes sense (see Robbins (1964) and Samuel (1963)), an idea of Robbins for
setting up the empirical Bayes framework for binomial populations is used below.

For each $i$, $i = 1, \ldots, k$, at stage $j$, consider $N + 1$ independent trials from
$\pi_i$. Let $X_{ij}$ and $Y_{ij}$, respectively, stand for the number of successes in the
first $N$ trials and in the last trial. Let
$Z_j = ((X_{1j}, Y_{1j}), \ldots, (X_{kj}, Y_{kj}))$ denote the observations at the $j$th
stage, $j = 1, \ldots, n$. We also let $X_{n+1} = X = (X_1, \ldots, X_k)$ denote the present
observations.

By the monotonicity of the estimators $\varphi_i(x)$, $1 \le i \le k$, in terms of Bayes
risks, one can see that all monotone procedures form an essentially complete class in the
set of all selection procedures. In view of this fact, it is reasonable to require that the
appropriate empirical Bayes procedures possess the above mentioned monotone property. For
this purpose, we first need to have some monotone empirical Bayes estimators for
$\varphi_i(x)$, $1 \le i \le k$. Gupta and Liang (1986), by using the isotonic regression
method, proposed two monotone empirical Bayes estimators for $\varphi_i(x)$.

B.  The  Proposed  Monotone  Empirical  Bayes  Selection  Procedures

For each $x = 0, 1, \ldots, N$, and $n = 1, 2, \ldots$, define

(3.4)    $f_{in}(x) = \frac{1}{n} \sum_{j=1}^{n} I_{\{x\}}(X_{ij}) + n^{-1}$;

(3.5)    $W_{in}(x) = \frac{1}{n} \sum_{j=1}^{n} Y_{ij} I_{\{x\}}(X_{ij}) + n^{-1}$;

Also, let $V_{ij} = X_{ij} + Y_{ij}$, $j = 1, 2, \ldots$ Define

(3.7)

(3.8)

$\varphi_{in}(x) = W_{in}(x) / f_{in}(x)$;

and for each $0 \le x \le N$, define

(3.9)    $\varphi^*_{in}(x) = \max_{0 \le s \le x} \min_{s \le t \le N} \{ \sum_{y=s}^{t} \varphi_{in}(y) / (t - s + 1) \}$;

(3.10)    $\tilde{\varphi}^*_{in}(x) = \max_{0 \le s \le x} \min_{s \le t \le N} \{ \sum_{y=s}^{t} \tilde{\varphi}_{in}(y) / (t - s + 1) \}$.



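By (3.9) and (3.10), one can see that both $\varphi^*_{in}(x)$ and
$\tilde{\varphi}^*_{in}(x)$ are increasing in $x$. Gupta and Liang (1986) proposed
$\varphi^*_{in}(x)$ (or $\tilde{\varphi}^*_{in}(x)$) as an estimator of $\varphi_i(x)$. They
also proposed two empirical Bayes selection procedures, say
$\psi^*_n = (\psi^*_{1n}, \ldots, \psi^*_{kn})$ and
$\tilde{\psi}_n = (\tilde{\psi}_{1n}, \ldots, \tilde{\psi}_{kn})$, which are given below,
respectively:

(3.11)    $\psi^*_{in}(x) = |S^*_n(x)|^{-1}$ if $i \in S^*_n(x)$;  $0$ otherwise,

where

(3.12)    $S^*_n(x) = \{i \mid \varphi^*_{in}(x_i) = \max_{1 \le j \le k} \varphi^*_{jn}(x_j)\}$;

and

(3.13)    $\tilde{\psi}_{in}(x) = |\tilde{S}_n(x)|^{-1}$ if $i \in \tilde{S}_n(x)$;  $0$ otherwise;

where

(3.14)    $\tilde{S}_n(x) = \{i \mid \tilde{\varphi}^*_{in}(x_i) = \max_{1 \le j \le k} \tilde{\varphi}^*_{jn}(x_j)\}$.

Due to the increasing property of the estimators $\varphi^*_{in}(x)$,
$\tilde{\varphi}^*_{in}(x)$, $1 \le i \le k$, one can see that $\psi^*_n$ and
$\tilde{\psi}_n$ are both monotone selection procedures.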
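
The smoothing step (3.9) and the selection step (3.11) and (3.12) can be sketched as follows. The code takes the raw estimates at $x = 0, \ldots, N$ as given (the numbers in the example are made up) and applies the max-min formula with the index ranges read here as $0 \le s \le x$ and $s \le t \le N$; the function names are hypothetical.

```python
def max_min_smooth(phi):
    # phi[y]: raw estimate at y = 0, ..., N. Returns the monotone version
    # max over 0 <= s <= x of min over s <= t <= N of mean(phi[s..t]).
    n_points = len(phi)
    prefix = [0.0]
    for value in phi:
        prefix.append(prefix[-1] + value)

    def average(s, t):
        return (prefix[t + 1] - prefix[s]) / (t - s + 1)

    # inner[s] = smallest average of phi over any block starting at s.
    inner = [min(average(s, t) for t in range(s, n_points)) for s in range(n_points)]
    smoothed = []
    running = float("-inf")
    for x in range(n_points):
        # A running maximum over s <= x makes the result nondecreasing in x.
        running = max(running, inner[x])
        smoothed.append(running)
    return smoothed


def select_best(xs, smoothed, tol=1e-12):
    # smoothed[i][x]: monotone estimate for population i at count x.
    scores = [smoothed[i][x] for i, x in enumerate(xs)]
    top = max(scores)
    best = [i for i, v in enumerate(scores) if v >= top - tol]
    return [1.0 / len(best) if i in best else 0.0 for i in range(len(xs))]


raw = [0.1, 0.5, 0.3, 0.9]  # hypothetical non-monotone raw estimates, N = 3
smooth = max_min_smooth(raw)
print(smooth, select_best([1, 3], [smooth, smooth]))
```

Because the outer maximum runs over a set that grows with x, the output is nondecreasing whatever the input, which is the property used in the text to conclude that the resulting selection procedures are monotone.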

Asymptotic Optimality of $\{\psi^*_n\}$ and $\{\tilde{\psi}_n\}$.

Without ambiguity, we still use $B(\psi, G)$ to denote the Bayes risk associated with the
selection procedure $\psi$ when $G$ is the true prior distribution.

Gupta and Liang (1986) proved that the two sequences of selection procedures
$\{\psi^*_n\}$ and $\{\tilde{\psi}_n\}$ have the following asymptotically optimal property:

$B(\psi^*_n, G) - B(\psi^B, G) \le O(\exp(-c_1 n))$,

and

$B(\tilde{\psi}_n, G) - B(\psi^B, G) \le O(\exp(-c_2 n))$,

for some positive constants $c_1$ and $c_2$.

3.2.2.  Selecting  Populations  Better  Than  A  Control 

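Let $\theta_0 \in (0, 1)$ denote a control parameter. Population $\pi_i$ is said to be good if $\theta_i \ge \theta_0$
and bad if $\theta_i < \theta_0$. Gupta and Liang (1984) considered the loss function

(3.15)  $L(\underline{\theta}, S) = \sum_{i \in S} (\theta_0 - \theta_i) I_{(0, \theta_0)}(\theta_i) + \sum_{i \notin S} (\theta_i - \theta_0) I_{(\theta_0, 1)}(\theta_i)$,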

for the problem of selecting (excluding) all good (bad) populations. In the loss function
(3.15), the first summation is the loss due to selecting some bad populations, and the
second summation is the loss due to not selecting some good populations. The value of the
control parameter $\theta_0$ may be either known or unknown. When $\theta_0$ is unknown, a sample from the
control population, say $\pi_0$, is needed. To be consistent with the notation used in earlier
sections, we assume $\theta_0$ is known. We note that Gupta and Liang (1984) have studied the
case when $\theta_0$ is unknown.
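As a small illustration of the two summations just described, the following Python sketch
evaluates the loss for a given parameter vector and selected subset. It assumes each
misclassified population is penalized by its distance $|\theta_i - \theta_0|$ from the control;
the function name `control_loss` and the numbers are hypothetical.

```python
def control_loss(theta, selected, theta0):
    """Loss for selecting the subset `selected` (a set of indices).

    A selected bad population (theta_i < theta0) costs theta0 - theta_i;
    an excluded good population (theta_i > theta0) costs theta_i - theta0.
    """
    loss = 0.0
    for i, th in enumerate(theta):
        if i in selected and th < theta0:
            loss += theta0 - th          # selected a bad population
        elif i not in selected and th > theta0:
            loss += th - theta0          # missed a good population
    return loss


theta = [0.2, 0.6, 0.9]                          # hypothetical true parameters
wrong = control_loss(theta, {0, 1}, theta0=0.5)  # 0.3 + 0.4
right = control_loss(theta, {1, 2}, theta0=0.5)  # no misclassification
```

Selecting exactly the good populations gives zero loss, which is what the selection
procedures in this subsection aim for.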
For the loss function (3.15), a nonrandomized Bayes selection procedure
$\alpha^B = (\alpha^B_1, \ldots, \alpha^B_k)$ is given by

(3.16)  $\alpha^B_i(\underline{x}) = \begin{cases} 1 & \text{if } \varphi_i(x_i) \ge \theta_0; \\ 0 & \text{otherwise}, \end{cases}$

where $\alpha^B_i(\underline{x})$ is the probability of selecting $\pi_i$ as a good population given $\underline{X} = \underline{x}$.
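The threshold form of (3.16) can be checked in one line, assuming that $\varphi_i(x_i)$ denotes the
posterior mean of $\theta_i$ given $X_i = x_i$ and that the loss penalizes each misclassified
population by its distance from $\theta_0$, as described after (3.15). Moving $\pi_i$ into the selected
subset changes the posterior expected loss by

```latex
E\bigl[(\theta_0 - \theta_i) I_{(0,\theta_0)}(\theta_i)
      - (\theta_i - \theta_0) I_{(\theta_0,1)}(\theta_i) \,\big|\, X_i = x_i\bigr]
  = E[\theta_0 - \theta_i \mid X_i = x_i]
  = \theta_0 - \varphi_i(x_i),
```

which is nonpositive exactly when $\varphi_i(x_i) \ge \theta_0$, so selecting $\pi_i$ in that case does not
increase the posterior expected loss.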


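Note that $\alpha^B$ is also a monotone selection procedure. Hence, based on the estimators
$\varphi^*_{in}(x)$ and $\tilde\varphi_{in}(x)$, two intuitive empirical Bayes procedures, say
$\alpha^*_n = (\alpha^*_{1n}, \ldots, \alpha^*_{kn})$ and $\tilde\alpha_n = (\tilde\alpha_{1n}, \ldots, \tilde\alpha_{kn})$, can be obtained, where

(3.17)  $\alpha^*_{in}(\underline{x}) = \begin{cases} 1 & \text{if } \varphi^*_{in}(x_i) \ge \theta_0; \\ 0 & \text{otherwise}; \end{cases}$

and

(3.18)  $\tilde\alpha_{in}(\underline{x}) = \begin{cases} 1 & \text{if } \tilde\varphi_{in}(x_i) \ge \theta_0; \\ 0 & \text{otherwise}. \end{cases}$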


Similarly, one can show that these two sequences of selection procedures $\{\alpha^*_n\}$ and
$\{\tilde\alpha_n\}$ have the following asymptotically optimal property:

$B(\alpha^*_n, G) - B(\alpha^B, G) \le O(\exp(-c_3 n))$,

and

$B(\tilde\alpha_n, G) - B(\alpha^B, G) \le O(\exp(-c_4 n))$
for some positive constants $c_3$ and $c_4$.
3.3.  Empirical  Bayes  Procedures  Related  to  Uniform  Populations


In this section, we assume that the random variables $X_i$, $1 \le i \le k$, have uniform
distributions $U(0, \theta_i)$, with $\theta_i > 0$ unknown. The parameter space is
$\Omega = \{\underline{\theta} \mid \theta_i > 0,\ 1 \le i \le k\}$. It is also assumed that the prior distribution $G$ on $\Omega$
has the form $G(\underline{\theta}) = \prod_{i=1}^{k} G_i(\theta_i)$, where $G_i(\cdot)$ is a distribution on $(0, \infty)$,
$i = 1, \ldots, k$.

Let $\theta_0 > 0$ be a known control parameter. Gupta and Hsiao (1983) considered the
loss function

(3.19)  $L(\underline{\theta}, S) = L_1 \sum_{i \notin S} (\theta_i - \theta_0) I_{(\theta_0, \infty)}(\theta_i) + L_2 \sum_{i \in S} (\theta_0 - \theta_i) I_{(0, \theta_0)}(\theta_i)$,

where $L_i$, $i = 1, 2$, are positive and known, for the problem of selecting populations better
than a standard $\theta_0$.

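Let $m_i(x)$ be the marginal pdf of $X_i$ and $M_i(x)$ be the marginal distribution of $X_i$.
Then, we have

(3.20)  $m_i(x) = \int_x^\infty \frac{1}{\theta}\, dG_i(\theta)$ for $x > 0$,

and

(3.21)  $M_i(x) = \int_0^x \int_t^\infty \frac{1}{\theta}\, dG_i(\theta)\, dt = x m_i(x) + G_i(x)$.

Note that the marginal pdf $m_i(x)$ is continuous and decreasing in $x$.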
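The identity for the marginal distribution function can be verified by interchanging the
order of integration. This is a sketch that assumes $G_i$ is regular enough for Fubini's
theorem to apply and that $m_i$ is continuous, as stated above:

```latex
\begin{aligned}
M_i(x) &= \int_0^x \int_t^\infty \frac{1}{\theta}\, dG_i(\theta)\, dt
        = \int_0^\infty \frac{\min(x, \theta)}{\theta}\, dG_i(\theta) \\
       &= \int_{(0, x]} dG_i(\theta) + x \int_{(x, \infty)} \frac{1}{\theta}\, dG_i(\theta)
        = G_i(x) + x\, m_i(x).
\end{aligned}
```

The first term collects the priors mass with $\theta \le x$, for which $X_i \le x$ with probability
one, and the second term collects the values $\theta > x$, for which $P(X_i \le x \mid \theta) = x/\theta$.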

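By direct computation, a Bayes procedure $\psi^B = (\psi^B_1, \ldots, \psi^B_k)$ for this selection
problem is given by

(3.22)  $\psi^B_i(\underline{x}) = \begin{cases} 1 & \text{if } (x_i \ge \theta_0) \text{ or } (x_i < \theta_0 \text{ and } \Delta_{iG}(x_i) \ge 0); \\ 0 & \text{otherwise}; \end{cases}$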
where

(3.23)  $\Delta_{iG}(x_i) = L_2 m_i(x_i)(x_i - \theta_0) + L_2 [M_i(\theta_0) - M_i(x_i)] + L_1 [1 - M_i(\theta_0)]$.


By the decreasing property of the pdfs $m_i(x)$, $1 \le i \le k$, one can see that $\Delta_{iG}(x)$,
$1 \le i \le k$, are increasing in $x$ for $x < \theta_0$, and hence the Bayes procedure $\psi^B$ has the
monotone property.
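The monotonicity claim can be made explicit as follows. This is a sketch that uses only the
form of (3.23), the relation $M_i(x') - M_i(x) = \int_x^{x'} m_i(t)\, dt$, and the fact that $m_i$ is
decreasing. For $0 < x < x' < \theta_0$:

```latex
\begin{aligned}
\Delta_{iG}(x') - \Delta_{iG}(x)
  &= L_2 \Bigl\{ m_i(x)(\theta_0 - x) - m_i(x')(\theta_0 - x')
        - \int_x^{x'} m_i(t)\, dt \Bigr\} \\
  &\ge L_2 \bigl\{ m_i(x)(\theta_0 - x) - m_i(x)(\theta_0 - x')
        - m_i(x)(x' - x) \bigr\} = 0 .
\end{aligned}
```

The inequality uses $m_i(x') \le m_i(x)$ together with $\theta_0 - x' > 0$ for the middle term, and
$m_i(t) \le m_i(x)$ on $[x, x']$ for the integral.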


Empirical  Bayes  Procedures 


To form an empirical Bayes procedure, we first need to have some estimators, say
$m_{in}(x)$ and $M_{in}(x)$, for $m_i(x)$ and $M_i(x)$, respectively. Due to the decreasing property
of $m_i(x)$, we require that the estimators $m_{in}(x)$, $n = 1, 2, \ldots$, possess the same property.
Once an estimator $m_{in}$ is obtained, we let

(3.24)  $M_{in}(x) = \int_0^x m_{in}(y)\, dy$,

and

(3.25)  $\Delta_{in}(x) = L_2 m_{in}(x)(x - \theta_0) + L_2 [M_{in}(\theta_0) - M_{in}(x)] + L_1 [1 - M_{in}(\theta_0)]$.


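Then, an empirical Bayes procedure $\psi_n = (\psi_{1n}, \ldots, \psi_{kn})$ can be given as follows:

(3.26)  $\psi_{in}(\underline{x}) = \begin{cases} 1 & \text{if } (x_i \ge \theta_0) \text{ or } (x_i < \theta_0 \text{ and } \Delta_{in}(x_i) \ge 0); \\ 0 & \text{otherwise}. \end{cases}$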
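A minimal Python sketch of the decision rule (3.26) for a single population. The arguments
`m` and `M` are hypothetical callables standing in for whichever estimates of the marginal
pdf and marginal distribution function are in use; the example plugs in the exact marginals
under a point-mass prior purely for illustration.

```python
def delta(x, m, M, theta0, L1, L2):
    """Evaluate the statistic of (3.25) at x for given m and M."""
    return (L2 * m(x) * (x - theta0)
            + L2 * (M(theta0) - M(x))
            + L1 * (1.0 - M(theta0)))


def select_vs_standard(x, m, M, theta0, L1=1.0, L2=1.0):
    """Return 1 if the population is selected as better than theta0, else 0."""
    if x >= theta0:
        return 1          # an observation above theta0 forces theta_i > theta0
    return 1 if delta(x, m, M, theta0, L1, L2) >= 0 else 0


# Illustration only: prior concentrated at theta = 2, so X ~ U(0, 2).
m_good = lambda x: 0.5 if 0 < x < 2 else 0.0
M_good = lambda x: min(max(x, 0.0), 2.0) / 2.0
decision = select_vs_standard(0.3, m_good, M_good, theta0=1.0)   # selected
```

The first branch needs no estimation at all, since an observation from $U(0, \theta_i)$ that
exceeds $\theta_0$ can only come from a population with $\theta_i > \theta_0$.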


This empirical Bayes procedure $\psi_n$ is a monotone procedure if $m_{in}(x)$, $1 \le i \le k$, are
decreasing in $x$. We use the method of Grenander (1956) to obtain an estimator
having this decreasing property.
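Let $X^n_{i(1)} \le X^n_{i(2)} \le \cdots \le X^n_{i(n)}$ be the ordered values of the first $n$ observations
taken from $\pi_i$. Let $F_{in}$ be the empirical distribution based on $X_{i1}, \ldots, X_{in}$. For each
$j$, $1 \le j \le n$, let

(3.27)  $m_{in}(X^n_{i(j)}) = \min_{s \le j-1} \max_{t \ge j} \frac{F_{in}(X^n_{i(t)}) - F_{in}(X^n_{i(s)})}{X^n_{i(t)} - X^n_{i(s)}}$,

where $X^n_{i(0)} = 0$, and define

(3.28)  $m_{in}(x) = \begin{cases} 0 & \text{for } x \le 0; \\ m_{in}(X^n_{i(j)}) & \text{for } X^n_{i(j-1)} < x \le X^n_{i(j)},\ j = 1, \ldots, n; \\ 0 & \text{for } x > X^n_{i(n)}. \end{cases}$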
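The min-max construction in (3.27) can be sketched directly in Python. This version assumes
the observations are positive and distinct (as they are with probability one for a
continuous distribution), so that the empirical distribution function equals $r/n$ at the
$r$-th order statistic. The function names and the sample are hypothetical, and the
brute-force search is cubic in $n$, so it is meant for illustration rather than large samples.

```python
def grenander_density(sample):
    """Return (knots, slopes) of a decreasing step-function density estimate.

    knots[0] = 0 and knots[j] is the j-th order statistic; slopes[j-1] is the
    value of the estimate on the interval (knots[j-1], knots[j]].
    """
    xs = sorted(sample)
    n = len(xs)
    knots = [0.0] + xs                       # X_(0) = 0
    F = [r / n for r in range(n + 1)]        # empirical cdf at each knot
    slopes = []
    for j in range(1, n + 1):
        slopes.append(min(
            max((F[t] - F[s]) / (knots[t] - knots[s])
                for t in range(j, n + 1))
            for s in range(j)
        ))
    return knots, slopes


def evaluate(knots, slopes, x):
    """Value of the step-function estimate at x (zero outside (0, X_(n)])."""
    if x <= 0 or x > knots[-1]:
        return 0.0
    for j in range(1, len(knots)):
        if x <= knots[j]:
            return slopes[j - 1]
    return 0.0


knots, slopes = grenander_density([3.0, 1.0, 4.0])   # slopes: 1/3, 2/9, 2/9
```

The resulting slopes are those of the least concave majorant of the empirical distribution
function, which is why they are nonincreasing and integrate to one.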

From (3.27) and (3.28), one can see that the estimator $m_{in}(x)$ is decreasing in $x$. Thus,
the empirical Bayes procedure defined by (3.24)-(3.28) is a monotone procedure. It is
known that both estimators $M_{in}(x)$ and $m_{in}(x)$ are strongly consistent. Hence,
$\Delta_{in}(x)$ is a strongly consistent estimator of $\Delta_{iG}(x)$. Then by Theorem 2.1 of Gupta and
Hsiao (1983), the sequence of empirical Bayes procedures $\{\psi_n\}$ is asymptotically optimal
provided $\int_0^\infty \theta\, dG_i(\theta) < \infty$ for each $i = 1, \ldots, k$.


3.4.  Remark  on  the  Monotonicity  of  Empirical  Bayes  Selection  Procedures 

The monotonicity of selection procedures is an important property in many selection
problems. Under some regularity conditions, Miescke (1979) showed that every Bayes
procedure is monotone. Hence, the class of all monotone selection procedures forms an
essentially complete class among the class of all selection procedures. In other words, a
non-monotone selection procedure is always inadmissible in terms of the Bayes risk.

Generally, the monotonicity of a Bayes selection procedure is due to the monotonicity
of the posterior expectation of loss functions (or functions related to loss functions).
Therefore, in empirical Bayes selection problems, one of the most important tasks is
to construct monotone estimators for each related monotone function.

Techniques for constructing monotone empirical Bayes estimators have been studied
by van Houwelingen (1976, 1977) for the continuous one-parameter exponential family and
also for a class of discrete distributions with the monotone likelihood ratio property. Stijnen
(1982, 1985) and van Houwelingen and Stijnen (1983) have studied the same problem for
the continuous one-parameter exponential family. Those techniques can (only) be applied
to selection problems whose underlying distributions belong to the above-mentioned
families of distributions. Further studies are needed to investigate the asymptotic behavior
of each related empirical Bayes selection procedure. The present authors are planning to do
some work along these lines.